Improving Linpack Performance on SMP Clusters with Asynchronous MPI Programming

Authors

  • Ta Quoc Viet
  • Tsutomu Yoshinaga

Abstract

This study proposes asynchronous MPI, a simple and effective parallel programming model for SMP clusters, and uses it to reimplement the High Performance Linpack (HPL) benchmark. The proposed model forces the processors of an SMP node to work in different phases, thereby avoiding unnecessary communication and computation bottlenecks. As a result, significant performance improvements can be achieved with minimal programming effort. Compared with the de facto flat MPI solution, our algorithm can yield a 20.6% performance improvement on a 16-node cluster of dual-processor Xeon SMPs.
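To make the phase-staggering idea more concrete, here is a deliberately idealized cost model. It is an assumption of this illustration, not the authors' analysis. Each of `procs` processes on a node computes for `c` time units per iteration and then sends for `m` units over a NIC that serializes messages. In a lock-step flat MPI schedule, every process reaches the NIC at the same moment. If their phases are staggered, one process can communicate while the others compute. The function names and parameters are hypothetical.

```python
def lockstep_time(c: float, m: float, procs: int, iters: int) -> float:
    """Bulk-synchronous schedule (hypothetical model): all processes compute
    together for c, then queue for the shared NIC, which serializes
    procs * m units of traffic before the next iteration can start."""
    return iters * (c + procs * m)


def staggered_time(c: float, m: float, procs: int, iters: int) -> float:
    """Idealized staggered schedule (steady state, pipeline fill/drain ignored).

    Each process still needs c + m per iteration, and the NIC must carry
    procs * m per iteration, so the period is the larger of the two.
    Offsetting processes round-robin by m reaches this bound.
    """
    return iters * max(c + m, procs * m)


if __name__ == "__main__":
    c, m, procs, iters = 3.0, 2.0, 2, 4  # hypothetical unit costs
    t_sync = lockstep_time(c, m, procs, iters)
    t_async = staggered_time(c, m, procs, iters)
    print(f"lock-step: {t_sync}, staggered: {t_async}, speedup: {t_sync / t_async:.2f}x")
```

With c = 3, m = 2, two processes per node and four iterations, the model gives 28 versus 20 time units. These figures only illustrate the mechanism. They are not the paper's measurements, and the model ignores barriers, pipeline fill, and memory-bus contention.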

Similar resources

Asynchronous Parallel Programming Model for SMP Clusters

Our study proposes a novel MPI-only parallel programming model with improved performance for SMP clusters. By rescheduling tasks in a typical flat MPI solution, our model forces processors of an SMP node to work in different phases, thereby avoiding unnecessary communication and computation bottlenecks. This study achieves a significant performance improvement with a minimal programming effort...

Performance Impact of Process Mapping on Small-Scale SMP Clusters - A Case Study Using High Performance Linpack

Typically, a High Performance Computing (HPC) cluster loosely couples multiple Symmetric MultiProcessor (SMP) platforms into a single processing complex. Each SMP uses shared memory for its processors to communicate, whereas communication across SMPs goes through the intra-cluster interconnect. By analyzing the communication pattern of processes, it is possible to arrive at a mapping of process...

Optimization for Hybrid MPI-OpenMP Programs on a Cluster of SMP PCs

This paper applies a Hybrid MPI-OpenMP programming model with a thread-to-thread communication method on a cluster of Dual Intel Xeon Processor SMPs connected by a Gigabit Ethernet network. The experiments include the well-known HPL and CG benchmarks. We also describe optimization techniques to get a high cache hit ratio with the given architecture. As a result, the hybrid model shows performan...

Performance Characteristics of Intel Architecture-based Servers

Computing clusters built from standard components using Intel® processors are becoming the fastest growing choice for high-performance computing (HPC). Twice yearly, the 500 most powerful computing systems in the world are ranked on the TOP500 Supercomputer Sites Web page. In November 2002, the ranking listed 56 entries using Intel processors; by June 2003, that number reached 119. Today, three...

A Scalable Asynchronous Replication-Based Strategy for Fault Tolerant MPI Applications

As computational clusters increase in size, their mean-time-to-failure reduces. Typically checkpointing is used to minimize the loss of computation. Most checkpointing techniques, however, require a central storage for storing checkpoints. This severely limits the scalability of checkpointing. We propose a scalable replication-based MPI checkpointing facility that is based on LAM/MPI. We extend...

Publication date: 2006